Nondestructive monitoring of annealing and chemical–mechanical planarization behavior using ellipsometry and deep learning

The Cu-filling process in through-silicon via (TSV-Cu) is a key technology for chip stacking and three-dimensional vertical packaging. During this process, defects resulting from chemical–mechanical planarization (CMP) and annealing severely affect the reliability of the chips. Traditional methods of defect characterization are destructive and cumbersome. In this study, a new defect inspection method was developed using Mueller matrix spectroscopic ellipsometry. TSV-Cu with a 3-μm-diameter and 8-μm-deep Cu filling showed three typical types of characteristics: overdishing (defect-OD), protrusion (defect-P), and defect-free. The process dimension for each defect was 13 nm. First, the three typical defects caused by CMP and annealing were investigated. With single-channel deep learning and a Mueller matrix element (MME), the TSV-Cu defect types could be distinguished with an accuracy rate of 99.94%. Next, seven effective MMEs were used as independent channels in the artificial neural network to quantify the height variation in the Cu filling in the z-direction. The accuracy rate was 98.92% after training, and the recognition accuracy reached 1 nm. The proposed approach rapidly and nondestructively evaluates the annealing bonding performance of CMP processes, which can improve the reliability of high-density integration.


Introduction
In recent years, three-dimensional (3D) integrated circuit (IC) technology with through-silicon via (TSV) has attracted significant attention because of its versatility, small size, and high performance. 3D IC is a technology that reduces the overall wire length and delay by vertically stacking multiple chips through high-density chip-to-chip interconnects [1–7]. TSV technology involves several processes, including etching holes in Si chips, depositing insulating/blocking/seeding layers, filling blind holes with Cu conductors, removing the backside Si and Cu overlay films via chemical–mechanical planarization (CMP) to expose Cu microcartridges, and ball bonding [8,9]. Several researchers have investigated the TSV manufacturing process. The microcharacterization and nanocharacterization of various processes have also received considerable attention as a means of improving the reliability of 3D ICs.
Significant issues in TSV technology include residual stress, extrusion, cracking, delamination, and Cu leaks, leading to severe reliability problems, such as warping of chips or wafers, interface delamination, cracking, and weak internal wire contact. These problems mainly result from the significant difference in the coefficient of thermal expansion between the surrounding silicon substrate and the TSV filler metal [10–12]. Recently, studies on protrusion in the Cu-filling process in TSV (TSV-Cu) have been conducted [13–15]. In these processes, detailed physical analyses are required because of the considerable impact of TSV-Cu protrusion on the reliability of the final 3D stacking. Many studies have been conducted to investigate annealing Cu protrusion and CMP dishing [16–23]. Cu protrusion and excessive dishing reduce the reliability of 3D stacking. In TSV manufacturing, it is crucial to control the CMP speed, annealing time, and temperature. Insufficient or excessive wafer polishing can lead to leaks and shorts, making the chips defective. Residual stress, interface delamination, and cracking occur when the annealing time or temperature is insufficient or excessive [6,24–26]. However, real-time nondestructive characterization methods for Cu protrusions and dishing remain limited.
Nondestructive characterization of TSV-Cu in a production line has not been extensively investigated. When many defects appear during the TSV process, the detection time of conventional methods is unsatisfactory [5,9]. The period and critical size of the TSV-Cu structure are on the microscale in both the vertical and horizontal dimensions. However, during the CMP and annealing processes, the height of the Cu in the z-direction must be controlled at the nanoscale [5,15,24]. Cross-scale defect characterization in the z-direction of TSV-Cu structures is therefore a significant challenge.
In this study, an ellipsometry measurement method was developed for the characterization of TSV-Cu. The rigorous coupled-wave analysis (RCWA) algorithm was employed to calculate the reflection electric field of TSV-Cu with different dimensions of Cu filling in the z-direction at different wavelengths. The corresponding Mueller matrices were calculated by modulating the incident and reflective electric fields, as described in the "Methods" section. Deep learning with Mueller matrix datasets (see the "Methods" section for details) was used to distinguish the different TSV-Cu defects. The effects of the annealing temperature and time on the Cu grain size were experimentally studied, and the defect size of TSV-Cu was quantified using a multichannel deep-learning method. A single Mueller matrix element (MME) was replaced with multiple MMEs as a dataset to improve the stability and accuracy of the quantitative TSV-Cu size.

Results and discussion
Defect classification of TSV-Cu
Figure 1a shows an atomic force microscopy image of a 3-μm-diameter and 8-μm-deep TSV-Cu structure after annealing at 250°C for 8.5 h. An optical model was established based on this structure, as shown in Fig. 1b. The model comprised 20 nm Ta as a blocking layer, 200 nm SiO2 as an insulating layer, a silicon substrate, and a Cu-filling layer with a diameter of 3 μm and depth of 8 μm. The Cu-filling period was 6 μm, the incidence angle was fixed at 45° [27], and the wavelength range was  (Fig. 1c). During the annealing process, when the temperature is lower than 200°C, the height of the upper surface of the Cu-filling (H_u) changes due to the significant mismatch of the coefficient of thermal expansion (CTE) between the Cu-filling and silicon substrate. Due to the higher CTE of Cu, the expansion of Cu is greater than that of silicon. In this temperature range, there is almost no change in the microstructure of the Cu-filling. The recrystallization of the Cu-filling occurs at approximately 200-250°C, at which point the Cu grain is refined. Once the annealing temperature exceeds 250°C, the grains grow significantly, driven by the total free energy [3]. H_u is positively correlated with the grain size [14]. Therefore, H_u subsequently increases. During the CMP process, the removal of the barrier layer and insulating layer is slower than that of the Cu filling. Through this process, H_u subsequently decreases. When H_u is excessively large or small, there is a risk of cracks and voids, respectively. With poor bonding quality, the electrical and thermal resistance of TSV-Cu increase, and the surface bonding strength decreases. Therefore, a reasonable H_u process window (−16 to −4 nm) is proposed in this work to achieve void-free and reliable interconnection [7,13]. A schematic of the defect types is shown in Fig. 1d. The upper surface of Si was regarded as the reference plane of 0 nm.
When H_u was higher than −4 nm in the z-direction (including values above the 0 nm reference plane), it was considered a protrusion defect (defect-P). H_u ranging between −16 and −4 nm was considered free from defects (defect-free). H_u lower than −16 nm indicated an overdishing defect (defect-OD). Defect-P and defect-OD caused extrusion cracking and incomplete contact during annealing bonding, respectively. The actual process window (−16 to −4 nm) was expected to have a higher upper limit because no increase in the dielectric bond strength was assumed. Figure 2 shows the 4 × 4 MMEs at different wavelengths after ellipsometry and normalization to m_11. The ordinate of each panel represents H_u, and the abscissa represents the wavelength. The MMEs in the off-diagonal 2 × 2 blocks were nearly zero because the structure of TSV-Cu was isotropic [28]. The sensitivity of the MME m_12 to a change in the structure increased at 0. Mueller matrices can be decomposed using the Lu–Chipman polar decomposition method [29]:

$$M = M_{\Delta} M_{R} M_{D}$$

where M_Δ, M_R, and M_D are the Mueller matrices of a depolarizer, a retarder, and a diattenuator, respectively. D is the dichroic scalar (diattenuation) of the Mueller matrices after polar decomposition, and D ≈ m_12. Therefore, the diattenuation of TSV-Cu with different H_u values showed obvious changes. In defect classification training using m_12, the validation accuracy at the seventh epoch was 99.80%, and the corresponding cross-entropy loss was 0.00043. Hence, only a single MME was required as a sample set to complete the defect classification using deep learning. t-distributed stochastic neighbor embedding (t-SNE) was employed to analyze the defect types, as shown in Fig. 3c [30]. Different scatter colors represent different defect types, and each point represents simulation data generated from m_12 (Fig. 2) with 10% random noise. The separation of the three clusters was evident (Fig. 3c).

During the TSV process, annealing was applied to make the Cu grains more homogeneous and to release the residual stress.
The Cu grain sizes were measured using electron backscatter diffraction at different annealing temperatures and times to show the need for defect quantification and the required quantization accuracy (Fig. 4). Higher temperatures and longer times facilitate grain growth, produce Cu protrusions, and release residual stress. Minute grains appeared when the temperature and time were 250°C and 40 h, respectively. Two or more nuclei were formed during the extended annealing times. Therefore, the fusing and growth times of the grains are insufficient when the annealing time is extremely short. When the annealing time was very long, small grains were formed owing to repeated nuclei precipitation, leading to inadequate annealing. When annealed at 250°C, the average Cu grain size increased by 60 nm from 9 to 40 h. Therefore, it is necessary to adjust the annealing time and temperature to control H_u during annealing bonding. The proposed metrology method should be able to resolve 10% of the process window to effectively monitor the TSV-Cu process. For the 12 nm defect-free process window, the resolution in the z-direction should therefore be finer than 1.2 nm to characterize H_u in TSV-Cu accurately. In this section, we attempted to improve the recognition accuracy to 1 nm.
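The process window and the resolution requirement stated above can be illustrated with a minimal sketch. It assumes the sign convention of the proposed window (Si surface at 0 nm, H_u negative when the Cu surface is recessed). The function names `classify_hu` and `required_resolution_nm`, and the choice to treat the window limits as inclusive, are illustrative assumptions and not part of the reported method.

```python
# Process window for the Cu-filling surface height H_u, in nm, measured from
# the Si surface (0 nm). The limits come from the text; treating them as
# inclusive is an assumption made for this sketch.
WINDOW_LOW_NM = -16.0
WINDOW_HIGH_NM = -4.0


def classify_hu(hu_nm: float) -> str:
    """Map a surface height H_u (nm) to one of the three defect classes."""
    if hu_nm > WINDOW_HIGH_NM:
        return "defect-P"      # Cu surface too high: protrusion
    if hu_nm < WINDOW_LOW_NM:
        return "defect-OD"     # Cu surface too low: overdishing
    return "defect-free"       # inside the process window


def required_resolution_nm(fraction: float = 0.10) -> float:
    """Height resolution needed to resolve a fraction of the process window."""
    return fraction * (WINDOW_HIGH_NM - WINDOW_LOW_NM)


if __name__ == "__main__":
    for hu in (-20, -10, -2, 5):
        print(f"H_u = {hu:+d} nm -> {classify_hu(hu)}")
    print(f"required resolution: {required_resolution_nm():.1f} nm")
```

With a 12 nm window, resolving 10% of it corresponds to 1.2 nm, which is why a 1 nm label spacing is a reasonable target.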
The training and testing accuracy and stability of deep learning must be improved to achieve a distinguishing accuracy level of 1 nm. In this study, a multichannel deep learning method was applied, in which seven effective MMEs were utilized. Because the actual characterization system causes random errors, random noise was added to the MMEs of each structure. Figure 5a shows the cross-entropy loss of a single channel (one MME) and multiple channels (seven MMEs) under 10% random noise, and Fig. 5b depicts the validation accuracy. The training target consisted of 61 labels, corresponding to H_u values from −30 to 30 nm in intervals of 1 nm. The multichannel training cross-entropy loss and validation accuracy converged faster than those of single-channel training. Figure 5c depicts the test accuracy of the single-channel and multichannel methods with different random noise levels. The test accuracy of the single channel was significantly influenced by the increase in noise, whereas the multichannel method was less affected. When the random noise increased from 10% to 30%, the test accuracy of the multichannel method changed by only 1.01%. When the random noise increased to 30%, the multichannel deep-learning test accuracy reached 98.92%. t-SNE was employed to visualize the size quantification and intuitively evaluate the differences in the TSV-Cu size (Fig. 5d). In the t-SNE scatter plot, clusters with good separation indicate that the TSV-Cu size can be distinguished clearly, whereas adjacent and overlapping clusters indicate very similar datasets. The effects of different morphologies and different measurement conditions on the measurement sensitivity were examined using seven MMEs (Fig. 6). One such effect is the aspect ratio of TSV-Cu. The diameter of the Cu-filling is 3 μm, and the aspect ratios are 0.5, 1, 2, 4, 6, 8, and 10 (Fig. 6a). Another such effect is the critical dimension difference between the top and bottom (Δ_Top-Bottom) of the Cu-filling.
The diameter of the Cu-filling is 3 μm, the aspect ratio is 1, and the Δ_Top-Bottom values are 2 μm, 1.6 μm, 1.2 μm, 0.8 μm, 0.4 μm, 0.2 μm, and 0 μm (Fig. 6b). The third is the azimuth angle of the incident light, which was set to 0°, 30°, 60°, 90°, 120°, 150°, and 180° (Fig. 6c). As seen from Table 1, with the change in the TSV morphologies and measurement conditions, the test accuracies fluctuate by no more than 1%. In addition, all the cross-entropy losses and validation accuracies converge to 0 and 1, respectively, as shown in Fig. 6. Hence, multichannel deep learning using seven MMEs can quantify the H_u of TSV-Cu with a quantization accuracy of 1 nm and is only weakly affected by random errors, aspect ratios, Δ_Top-Bottom, or azimuths.
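The quantification task above treats each 1 nm step of H_u as a separate class, and training quality is judged by the cross-entropy loss and the accuracy. The following sketch shows one way this labeling and these two metrics could be written down. The helper names (`hu_to_label`, `label_to_hu`, `cross_entropy`, `accuracy`) are hypothetical, and the network outputs are represented by plain lists of logits rather than by the authors' trained model.

```python
import math

# Label grid from the text: H_u from -30 nm to 30 nm in 1 nm steps.
HU_MIN_NM, HU_MAX_NM, STEP_NM = -30, 30, 1
NUM_CLASSES = (HU_MAX_NM - HU_MIN_NM) // STEP_NM + 1  # 61 labels


def hu_to_label(hu_nm: int) -> int:
    """Convert a height in nm to a class index in [0, NUM_CLASSES)."""
    return (hu_nm - HU_MIN_NM) // STEP_NM


def label_to_hu(label: int) -> int:
    """Convert a class index back to the height it represents, in nm."""
    return HU_MIN_NM + label * STEP_NM


def softmax(logits):
    """Numerically stable softmax over a list of output-layer logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label: int) -> float:
    """Cross-entropy loss of one sample against its true class index."""
    return -math.log(softmax(logits)[label])


def accuracy(batch_logits, labels) -> float:
    """Fraction of samples whose largest logit matches the true label."""
    hits = 0
    for logits, label in zip(batch_logits, labels):
        predicted = max(range(len(logits)), key=logits.__getitem__)
        hits += int(predicted == label)
    return hits / len(labels)
```

An untrained network with uniform outputs gives a loss of ln(61) per sample, so a loss converging toward 0 together with an accuracy converging toward 1 indicates that neighboring 1 nm classes are being separated.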

Conclusion
The polarization of light is highly sensitive to TSV-Cu structures of different sizes. In this study, three defect process windows of TSV were selected, and the Mueller matrix of TSV-Cu was calculated using Fourier series and ellipsometry. The effective MME and single-channel deep learning distinguished the defect types of TSV-Cu with an accuracy of more than 99%. Multichannel deep learning based on seven MMEs quantified the H_u of TSV-Cu and achieved a recognition accuracy level of 1 nm. Random noise had only a limited impact on the multichannel quantization accuracy, which remained at 98.92% when the random noise increased to 30%. Multichannel deep learning is more accurate and stable and less affected by random noise, aspect ratios, Δ_Top-Bottom, and azimuths, which can improve the reliability of high-density integration in micro- and nanofabrication technologies.

Optical simulation and calculation of the Mueller matrix
The modulation of the incident optical wave can be regarded as natural light passing in turn through a stationary linear polarizer and a quarter-wave plate whose fast-axis rotation angle is θ. The Stokes vector of natural light, normalized to unit intensity, is

$$S_0 = [1, 0, 0, 0]^{T}$$

In addition, the polarization angle and phase difference of the incident optical wave are expressed using the sign function, where Sign{x} = 1 if x > 0, Sign{x} = 0 if x = 0, and Sign{x} = −1 if x < 0. Ψ is the polarization angle of the incident optical wave, where Ψ = 0° is p-polarized and Ψ = 90° is s-polarized. Φ is the phase difference of the incident optical wave, and θ is the fast-axis rotation angle of the first quarter-wave plate. The reflected electric fields, E_x and E_y, and the corresponding phases of the incident light after reflection from TSV-Cu were calculated using the RCWA algorithm [31], where δ is the phase difference between E_x and E_y. The light reflected by the sample is remodulated by the second quarter-wave plate and the analyzer to give S_out, where M_A is the Mueller matrix of the analyzer, which has the same form as that of the polarizer (M_A = M_P), and M_C2 is the Mueller matrix of the second quarter-wave plate, which has the same form as that of the first (M_C2 = M_C1). S_out1 is the total light intensity detected by the detector. In this study, S_out1 was expanded using the Fourier series. Because the rotation angle ratio of the incident and outgoing quarter-wave plates was 5:3, the highest order of the Fourier series was 32. Sixteen MMEs of TSV-Cu were calculated, and all elements were normalized to m_11 [32].
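The modulation chain described above can be checked numerically with a small standard-library sketch. It builds textbook Mueller matrices for a horizontal linear polarizer and a rotated quarter-wave plate, drives the two plates at 5 and 3 times a common base angle, and confirms that the detected intensity contains Fourier orders up to 32. The sample matrix `M_SAMPLE` is a made-up placeholder rather than an RCWA result, and the sign convention of the retarder and all function names are assumptions of this sketch.

```python
import cmath
import math


def matmul(a, b):
    """Product of two 4 x 4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def matvec(a, v):
    """Product of a 4 x 4 matrix and a 4-element Stokes vector."""
    return [sum(a[i][k] * v[k] for k in range(4)) for i in range(4)]


def rotator(alpha):
    """Mueller rotation matrix for a rotation of the element by alpha."""
    c, s = math.cos(2 * alpha), math.sin(2 * alpha)
    return [[1, 0, 0, 0], [0, c, s, 0], [0, -s, c, 0], [0, 0, 0, 1]]


def rotated(m, alpha):
    """Mueller matrix of optical element m with its axis rotated by alpha."""
    return matmul(rotator(-alpha), matmul(m, rotator(alpha)))


S_0 = [1.0, 0.0, 0.0, 0.0]  # natural (unpolarized) light
M_P = [[0.5, 0.5, 0, 0], [0.5, 0.5, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
M_QWP = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, -1, 0]]

# Hypothetical sample Mueller matrix, used only to exercise the chain.
M_SAMPLE = [[1.00, -0.30, 0.05, 0.02],
            [-0.30, 0.90, 0.04, -0.03],
            [0.02, -0.05, 0.60, 0.50],
            [0.01, 0.03, -0.50, 0.60]]


def detected_intensity(base_angle):
    """First Stokes component after polarizer, plate 1, sample, plate 2, analyzer."""
    s_in = matvec(rotated(M_QWP, 5 * base_angle), matvec(M_P, S_0))
    s_reflected = matvec(M_SAMPLE, s_in)
    s_out = matvec(M_P, matvec(rotated(M_QWP, 3 * base_angle), s_reflected))
    return s_out[0]


def harmonic_amplitudes(n_samples=128, max_order=40):
    """Magnitudes of the Fourier coefficients of the intensity over one turn."""
    signal = [detected_intensity(2 * math.pi * k / n_samples)
              for k in range(n_samples)]
    amplitudes = []
    for order in range(max_order + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * order * k / n_samples)
                    for k, x in enumerate(signal)) / n_samples
        amplitudes.append(abs(coeff))
    return amplitudes


def highest_order(tol=1e-9):
    """Largest Fourier order whose amplitude is above the numerical floor."""
    amplitudes = harmonic_amplitudes()
    return max(order for order, a in enumerate(amplitudes) if a > tol)
```

Each rotated quarter-wave plate contributes terms up to four times its own angle, so plates at 5 and 3 times the base angle give a top order of 4 × 5 + 4 × 3 = 32, in line with the order quoted in the text.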

Deep learning
In this study, we developed a machine learning approach to classify defects. A convolutional neural network (CNN) was employed in combination with a fully connected network (FCN) (Fig. 7). The CNN, a preferred method for solving pattern recognition problems, consisted of three layers and used the "leaky ReLU" activation function [33,34]. Each CNN layer was connected to a max-pooling layer with batch normalization. The number of filters and kernel size of each channel in each CNN layer are listed in Table 2. The FCN consisted of two layers with 64 and 32 neurons and utilized "tanh" activations. Mueller matrices of different H_u at different wavelengths were introduced into the ANN input layer as parallel channels, and the kernel size of each channel was seven. Each defect type was assigned to a neuron in the ANN output layer. During the training of the ANN, a Mueller matrix was introduced into the ANN input layer, and the corresponding defect types were introduced into the output layer. Fabricating tens or hundreds of samples for each group of structures was impractical because of the cost and time of the actual TSV-Cu production process. During sample measurement, there are two sources of random error: the sample manufacturing error and the measurement error [35,36]. A manufacturing error in the form of a surface displacement fluctuation of 0.1 nm is inevitable in the TSV-Cu process [23]. It is difficult to obtain a specific measurement error because it depends on the instrument and environment. In practice, the spectral change caused by measurement error is usually smaller than that caused by manufacturing error. The spectrum of the TSV-Cu surface fluctuation (H_u ± 0.1 nm) was analyzed, and all effective spectral change rates were less than 10% (Fig. 8). It can be seen from Fig. 8 that only a small number of spectral change rates are close to 10%.
Therefore, 60 structures were simulated, each with a Cu pillar size interval of 1 nm in the z-direction, and 300 noisy copies (with 10% random noise) of the Mueller matrix of each structure at different wavelengths were generated as the dataset. Seventy percent of the Mueller matrices were used for training. The remaining 30% were used for testing. In the readout scheme, the Mueller matrix was fed into the ANN after training and propagated forward through the network to retrieve the defect types distinguished by the z-direction structure. Finally, the cross-entropy loss and validation accuracy were determined to verify the training quality of the ANN.
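The dataset construction described above (60 structures, 300 noisy copies each, a 70/30 split) can be sketched as follows. The text does not specify the noise model, so this sketch assumes a uniform multiplicative perturbation of up to ±10% per spectral point. The function `placeholder_spectra` is a hypothetical stand-in for the RCWA-simulated MME spectra, and all names are illustrative.

```python
import math
import random


def placeholder_spectra(n_structures=60, n_wavelengths=50):
    """Hypothetical smooth spectra standing in for simulated MME spectra."""
    return [[0.5 + 0.4 * math.cos(0.1 * w + 0.05 * s)
             for w in range(n_wavelengths)]
            for s in range(n_structures)]


def add_noise(spectrum, noise_level, rng):
    """Scale each point by a random factor in [1 - level, 1 + level]."""
    return [v * (1.0 + rng.uniform(-noise_level, noise_level))
            for v in spectrum]


def build_dataset(clean_spectra, copies=300, noise_level=0.10, seed=0):
    """Return (noisy_spectrum, label) pairs, `copies` per structure."""
    rng = random.Random(seed)
    samples = []
    for label, spectrum in enumerate(clean_spectra):
        for _ in range(copies):
            samples.append((add_noise(spectrum, noise_level, rng), label))
    return samples


def split_dataset(samples, train_fraction=0.7, seed=0):
    """Shuffle the samples and split them into training and test subsets."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    dataset = build_dataset(placeholder_spectra())
    train, test = split_dataset(dataset)
    print(len(dataset), len(train), len(test))
```

Shuffling before the split keeps every structure represented in both subsets. A per-structure (stratified) split would be an alternative if exact class balance in the test set matters.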